\section{Introduction}

Over the years, the World Wide Web has grown from a small set of text-only web
pages into a rich media delivery platform for organizations of all kinds. Today,
a website must not only deliver content to the user but also present it in a
visually appealing way. Appearances matter, and web pages are no exception.

The presentation of content has been brought to the forefront by the advent
of the Web 2.0 revolution. The primary focus is still on the content itself,
but its presentation can be equally important. In a world where websites
are easy to build, website developers differentiate themselves from the
competition by presenting data in a visually appealing way. The industry's
focus on usability is higher than ever
before~\cite{harrisonmcknight2002iic,koufaris2004dit}.

To keep up appearances, it is thus vitally important to keep a website's
presentation up to date. At present, people and organizations achieve this
by hiring domain experts (web page designers) or by using templates, which
can be viewed as a restricted form of an expert system whose knowledge base is
specified by a domain expert. However, the problem is compounded when web
designers are in limited supply.

In this paper we present an approach that uses stochastic techniques
to extract domain knowledge when domain experts are absent or in limited
supply. We applied the approach to the domain of web page design to test
our hypothesis that domain knowledge can be extracted from the domain itself
in the absence of domain experts. We view the World Wide Web as a large
knowledge base that provides the basis for a training set.

However, what defines a good website is subjective. More precisely, the
values expected for a given parameter of a website depend on the category to
which the web page belongs. For example, load time, the time taken to
load a web page, is expected to be low for most web pages. However, web pages
that use rich multimedia features may be expected to have a slightly higher
load time. We therefore rephrase the research question as follows:
can we use stochastic techniques to classify web pages as belonging to a
particular class of web pages?
